--- title: Tutorial keywords: fastai sidebar: home_sidebar summary: "The goal of this challenge is to find all instances of dolphins in a picture and then color the pixels of each dolphin with a unique color." description: "The goal of this challenge is to find all instances of dolphins in a picture and then color the pixels of each dolphin with a unique color." nb_path: "notebooks/tutorial/00_DolphinsTutorial.ipynb" ---
{% raw %}
{% endraw %}

Please open this notebook in Colab to edit it and submit a solution:

Open In Colab

{% raw %}
try:
    import dolphins_recognition_challenge
except Exception:
    if "google.colab" in str(get_ipython()):
        print("Running on CoLab")
        !pip install dolphins-recognition-challenge
{% endraw %} {% raw %}
%load_ext autoreload
%autoreload 2
{% endraw %} {% raw %}
import numpy as np
import PIL
from PIL import Image

import torch
import torchvision
import pandas as pd
import seaborn as sns
{% endraw %}

Download data

We start by downloading and visualizing the dataset of 200 photographs, each containing one or more dolphins, split into a training set of 160 photographs and a validation set of 40 photographs.

{% raw %}
from dolphins_recognition_challenge.datasets import get_dataset, display_batches
    
data_loader, data_loader_test = get_dataset("segmentation", batch_size=3)

display_batches(data_loader, n_batches=2)

{% endraw %}

Data augmentation

In order to prevent overfitting, which happens easily when the dataset is small, we apply a number of transformations to effectively increase the size of the dataset. One transformation already implemented in the torchvision library is RandomHorizontalFlip, and we will implement MyColorJitter, a thin wrapper around the torchvision.transforms.ColorJitter class. We cannot use that class directly without a wrapper because a transformation may need to modify the targets as well as the image. For example, if we were to implement RandomCrop, we would need to crop the segmentation masks and readjust the bounding boxes as well.

{% raw %}
class MyColorJitter:
    def __init__(self, brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5):
        self.torch_color_jitter = torchvision.transforms.ColorJitter(
            brightness=brightness, contrast=contrast, saturation=saturation, hue=hue
        )

    def __call__(self, image, target):
        image = self.torch_color_jitter(image)
        return image, target
{% endraw %}

We will apply a series of transformations to each image, combining them into a single transformation as follows:

{% raw %}
from dolphins_recognition_challenge.datasets import ToTensor, ToPILImage, Compose, RandomHorizontalFlip

def get_tensor_transforms(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(
            MyColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
        )
        transforms.append(RandomHorizontalFlip(0.5))
        # TODO: add additional transforms: e.g. random crop
    return Compose(transforms)
{% endraw %}

With data augmentation defined, we are ready to generate the actual datasets used for training our model.

{% raw %}
batch_size = 4

data_loader, data_loader_test = get_dataset(
    "segmentation", get_tensor_transforms=get_tensor_transforms, batch_size=batch_size
)

display_batches(data_loader, n_batches=4)
{% endraw %}

{% include tip.html content='incorporate more transformation classes such as RandomCrop etc. (https://pytorch.org/docs/stable/torchvision/transforms.html)' %}
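Following the tip above, here is a minimal sketch of what a target-aware random crop could look like. It assumes the target dict follows the torchvision detection convention (`boxes` as an (N, 4) tensor in `xyxy` format, `masks` as an (N, H, W) tensor); the class name `MyRandomCrop` is hypothetical, and a full implementation would also drop instances whose boxes or masks become empty after cropping.

{% raw %}
```python
import torch


class MyRandomCrop:
    """Hypothetical sketch: crop image, masks, and boxes together."""

    def __init__(self, output_size):
        self.output_size = output_size  # (height, width)

    def __call__(self, image, target):
        # image: (C, H, W) tensor; target follows the torchvision detection format
        _, h, w = image.shape
        new_h, new_w = self.output_size
        top = torch.randint(0, h - new_h + 1, (1,)).item()
        left = torch.randint(0, w - new_w + 1, (1,)).item()

        # crop the image and the segmentation masks with the same window
        image = image[:, top : top + new_h, left : left + new_w]
        target["masks"] = target["masks"][:, top : top + new_h, left : left + new_w]

        # shift boxes into crop coordinates and clamp them to the crop window
        boxes = target["boxes"] - torch.tensor([left, top, left, top])
        boxes[:, [0, 2]] = boxes[:, [0, 2]].clamp(0, new_w)
        boxes[:, [1, 3]] = boxes[:, [1, 3]].clamp(0, new_h)
        target["boxes"] = boxes
        # NOTE: instances that fall entirely outside the crop should be removed
        return image, target
```
{% endraw %}

A transform like this could then be appended to the list in `get_tensor_transforms`, just like `MyColorJitter` and `RandomHorizontalFlip`.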

Model

We can reuse a model for instance segmentation already trained on another dataset and fine-tune it for our particular problem, in our case the dolphin dataset.

{% raw %}
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_instance_segmentation_model(hidden_layer_size, box_score_thresh=0.5):
    # our dataset has two classes only - background and dolphin    
    num_classes = 2
    
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=True, 
        box_score_thresh=box_score_thresh, 
    )

    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels

    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_channels=in_features_mask, 
        dim_reduced=hidden_layer_size,
        num_classes=num_classes
    )

    return model
{% endraw %}

Before using the constructed model, we should move it to the appropriate device. We test whether a GPU is available and, if so, move the model there.

{% raw %}
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# get the model using our helper function
model = get_instance_segmentation_model(hidden_layer_size=256)

# move model to the right device
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
{% endraw %}

We have implemented a function for training a model for one epoch, i.e. using each image from the training dataset exactly once. Let's train for one epoch and compare the predictions made before and after it.

{% raw %}
data_loader, data_loader_test = get_dataset(
    "segmentation",
    batch_size=4,
    get_tensor_transforms=get_tensor_transforms,
    n_samples=8,
)
{% endraw %} {% raw %}
data_loader, data_loader_test = get_dataset(
    "segmentation", get_tensor_transforms=get_tensor_transforms, batch_size=batch_size
)
{% endraw %} {% raw %}
from dolphins_recognition_challenge.instance_segmentation.model import train_one_epoch
from dolphins_recognition_challenge.instance_segmentation.model import show_predictions

show_predictions(model, data_loader=data_loader_test, n=1, score_threshold=0.5)

num_epochs = 1

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch=epoch, print_freq=20)

train_one_epoch(model, optimizer, data_loader, device, epoch=1, print_freq=20)

show_predictions(model, data_loader=data_loader_test, n=1, score_threshold=0.5)
/root/.local/lib/python3.6/site-packages/torch/nn/functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
Epoch: [0]  [ 0/40]  eta: 0:01:20  lr: 0.000133  loss: 4.0365 (4.0365)  loss_classifier: 0.7135 (0.7135)  loss_box_reg: 0.2773 (0.2773)  loss_mask: 3.0243 (3.0243)  loss_objectness: 0.0091 (0.0091)  loss_rpn_box_reg: 0.0124 (0.0124)  time: 2.0047  data: 1.1275  max mem: 4477
Epoch: [0]  [20/40]  eta: 0:00:14  lr: 0.002695  loss: 0.9883 (1.4917)  loss_classifier: 0.2163 (0.2969)  loss_box_reg: 0.2675 (0.2757)  loss_mask: 0.3774 (0.8357)  loss_objectness: 0.0129 (0.0390)  loss_rpn_box_reg: 0.0197 (0.0443)  time: 0.6811  data: 0.0102  max mem: 4756
Epoch: [0]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.6886 (1.0935)  loss_classifier: 0.1147 (0.2084)  loss_box_reg: 0.2390 (0.2614)  loss_mask: 0.2367 (0.5609)  loss_objectness: 0.0106 (0.0255)  loss_rpn_box_reg: 0.0145 (0.0372)  time: 0.6678  data: 0.0102  max mem: 5186
Epoch: [0] Total time: 0:00:28 (0.7101 s / it)
Epoch: [1]  [ 0/40]  eta: 0:01:16  lr: 0.005000  loss: 0.5957 (0.5957)  loss_classifier: 0.0515 (0.0515)  loss_box_reg: 0.1563 (0.1563)  loss_mask: 0.3717 (0.3717)  loss_objectness: 0.0052 (0.0052)  loss_rpn_box_reg: 0.0109 (0.0109)  time: 1.9157  data: 1.1354  max mem: 5186
Epoch: [1]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.5525 (0.5504)  loss_classifier: 0.0842 (0.0874)  loss_box_reg: 0.1895 (0.1959)  loss_mask: 0.2161 (0.2374)  loss_objectness: 0.0075 (0.0097)  loss_rpn_box_reg: 0.0120 (0.0200)  time: 0.7117  data: 0.0182  max mem: 5186
Epoch: [1]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.4374 (0.5299)  loss_classifier: 0.0783 (0.0864)  loss_box_reg: 0.1562 (0.1836)  loss_mask: 0.1850 (0.2205)  loss_objectness: 0.0056 (0.0090)  loss_rpn_box_reg: 0.0118 (0.0305)  time: 0.6924  data: 0.0121  max mem: 5186
Epoch: [1] Total time: 0:00:29 (0.7340 s / it)
{% endraw %}

Now we can fully train the model for more epochs, in this case for 19 more.

{% raw %}
num_epochs = 20

data_loader, data_loader_test = get_dataset(
    "segmentation", batch_size=4, get_tensor_transforms=get_tensor_transforms
)
{% endraw %} {% raw %}
for epoch in range(1, num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch=epoch, print_freq=20)

    lr_scheduler.step()
Epoch: [1]  [ 0/40]  eta: 0:01:13  lr: 0.005000  loss: 0.3240 (0.3240)  loss_classifier: 0.0630 (0.0630)  loss_box_reg: 0.1141 (0.1141)  loss_mask: 0.1316 (0.1316)  loss_objectness: 0.0045 (0.0045)  loss_rpn_box_reg: 0.0107 (0.0107)  time: 1.8457  data: 1.1334  max mem: 5186
Epoch: [1]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3992 (0.4196)  loss_classifier: 0.0662 (0.0715)  loss_box_reg: 0.1295 (0.1393)  loss_mask: 0.1815 (0.1859)  loss_objectness: 0.0056 (0.0075)  loss_rpn_box_reg: 0.0141 (0.0155)  time: 0.6964  data: 0.0127  max mem: 5186
Epoch: [1]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3979 (0.4282)  loss_classifier: 0.0654 (0.0714)  loss_box_reg: 0.1271 (0.1398)  loss_mask: 0.1731 (0.1836)  loss_objectness: 0.0047 (0.0076)  loss_rpn_box_reg: 0.0075 (0.0257)  time: 0.6889  data: 0.0110  max mem: 5186
Epoch: [1] Total time: 0:00:28 (0.7232 s / it)
Epoch: [2]  [ 0/40]  eta: 0:01:13  lr: 0.005000  loss: 0.4541 (0.4541)  loss_classifier: 0.0680 (0.0680)  loss_box_reg: 0.1252 (0.1252)  loss_mask: 0.2360 (0.2360)  loss_objectness: 0.0086 (0.0086)  loss_rpn_box_reg: 0.0163 (0.0163)  time: 1.8309  data: 1.0924  max mem: 5186
Epoch: [2]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3629 (0.3645)  loss_classifier: 0.0530 (0.0580)  loss_box_reg: 0.1268 (0.1203)  loss_mask: 0.1568 (0.1564)  loss_objectness: 0.0027 (0.0040)  loss_rpn_box_reg: 0.0086 (0.0259)  time: 0.7032  data: 0.0117  max mem: 5186
Epoch: [2]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3992 (0.3820)  loss_classifier: 0.0589 (0.0609)  loss_box_reg: 0.1344 (0.1266)  loss_mask: 0.1803 (0.1666)  loss_objectness: 0.0039 (0.0052)  loss_rpn_box_reg: 0.0094 (0.0226)  time: 0.6926  data: 0.0106  max mem: 5186
Epoch: [2] Total time: 0:00:29 (0.7277 s / it)
Epoch: [3]  [ 0/40]  eta: 0:01:12  lr: 0.005000  loss: 0.3590 (0.3590)  loss_classifier: 0.0385 (0.0385)  loss_box_reg: 0.1003 (0.1003)  loss_mask: 0.1987 (0.1987)  loss_objectness: 0.0046 (0.0046)  loss_rpn_box_reg: 0.0170 (0.0170)  time: 1.8160  data: 1.0506  max mem: 5186
Epoch: [3]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3535 (0.3699)  loss_classifier: 0.0593 (0.0587)  loss_box_reg: 0.1167 (0.1249)  loss_mask: 0.1595 (0.1696)  loss_objectness: 0.0028 (0.0036)  loss_rpn_box_reg: 0.0070 (0.0130)  time: 0.7014  data: 0.0101  max mem: 5186
Epoch: [3]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3484 (0.3794)  loss_classifier: 0.0601 (0.0591)  loss_box_reg: 0.1123 (0.1202)  loss_mask: 0.1600 (0.1627)  loss_objectness: 0.0125 (0.0155)  loss_rpn_box_reg: 0.0072 (0.0219)  time: 0.6965  data: 0.0122  max mem: 5187
Epoch: [3] Total time: 0:00:29 (0.7283 s / it)
Epoch: [4]  [ 0/40]  eta: 0:01:08  lr: 0.005000  loss: 0.3141 (0.3141)  loss_classifier: 0.0439 (0.0439)  loss_box_reg: 0.0967 (0.0967)  loss_mask: 0.1453 (0.1453)  loss_objectness: 0.0172 (0.0172)  loss_rpn_box_reg: 0.0110 (0.0110)  time: 1.7100  data: 0.9552  max mem: 5187
Epoch: [4]  [20/40]  eta: 0:00:14  lr: 0.005000  loss: 0.3018 (0.3278)  loss_classifier: 0.0428 (0.0443)  loss_box_reg: 0.0901 (0.0994)  loss_mask: 0.1455 (0.1498)  loss_objectness: 0.0045 (0.0058)  loss_rpn_box_reg: 0.0076 (0.0284)  time: 0.6971  data: 0.0118  max mem: 5187
Epoch: [4]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3191 (0.3350)  loss_classifier: 0.0493 (0.0514)  loss_box_reg: 0.1104 (0.1076)  loss_mask: 0.1533 (0.1505)  loss_objectness: 0.0027 (0.0049)  loss_rpn_box_reg: 0.0095 (0.0207)  time: 0.6971  data: 0.0102  max mem: 5187
Epoch: [4] Total time: 0:00:28 (0.7240 s / it)
Epoch: [5]  [ 0/40]  eta: 0:01:14  lr: 0.005000  loss: 0.2519 (0.2519)  loss_classifier: 0.0284 (0.0284)  loss_box_reg: 0.0748 (0.0748)  loss_mask: 0.1220 (0.1220)  loss_objectness: 0.0042 (0.0042)  loss_rpn_box_reg: 0.0225 (0.0225)  time: 1.8564  data: 1.0900  max mem: 5187
Epoch: [5]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2967 (0.2996)  loss_classifier: 0.0421 (0.0452)  loss_box_reg: 0.0881 (0.1006)  loss_mask: 0.1271 (0.1374)  loss_objectness: 0.0027 (0.0034)  loss_rpn_box_reg: 0.0049 (0.0130)  time: 0.7039  data: 0.0101  max mem: 5187
Epoch: [5]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3243 (0.3194)  loss_classifier: 0.0430 (0.0460)  loss_box_reg: 0.1040 (0.1038)  loss_mask: 0.1448 (0.1472)  loss_objectness: 0.0016 (0.0033)  loss_rpn_box_reg: 0.0094 (0.0192)  time: 0.6919  data: 0.0104  max mem: 5187
Epoch: [5] Total time: 0:00:29 (0.7297 s / it)
Epoch: [6]  [ 0/40]  eta: 0:01:12  lr: 0.005000  loss: 0.3384 (0.3384)  loss_classifier: 0.0460 (0.0460)  loss_box_reg: 0.0972 (0.0972)  loss_mask: 0.1888 (0.1888)  loss_objectness: 0.0022 (0.0022)  loss_rpn_box_reg: 0.0042 (0.0042)  time: 1.8089  data: 1.0185  max mem: 5187
Epoch: [6]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2867 (0.3056)  loss_classifier: 0.0428 (0.0500)  loss_box_reg: 0.0924 (0.1065)  loss_mask: 0.1299 (0.1377)  loss_objectness: 0.0013 (0.0020)  loss_rpn_box_reg: 0.0054 (0.0094)  time: 0.7109  data: 0.0103  max mem: 5187
Epoch: [6]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3222 (0.3242)  loss_classifier: 0.0428 (0.0481)  loss_box_reg: 0.0937 (0.1045)  loss_mask: 0.1724 (0.1499)  loss_objectness: 0.0023 (0.0025)  loss_rpn_box_reg: 0.0084 (0.0191)  time: 0.6966  data: 0.0103  max mem: 5187
Epoch: [6] Total time: 0:00:29 (0.7320 s / it)
Epoch: [7]  [ 0/40]  eta: 0:01:13  lr: 0.005000  loss: 0.3195 (0.3195)  loss_classifier: 0.0596 (0.0596)  loss_box_reg: 0.1272 (0.1272)  loss_mask: 0.1275 (0.1275)  loss_objectness: 0.0020 (0.0020)  loss_rpn_box_reg: 0.0032 (0.0032)  time: 1.8336  data: 1.0330  max mem: 5187
Epoch: [7]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2632 (0.2951)  loss_classifier: 0.0369 (0.0413)  loss_box_reg: 0.0764 (0.0889)  loss_mask: 0.1331 (0.1402)  loss_objectness: 0.0011 (0.0035)  loss_rpn_box_reg: 0.0055 (0.0213)  time: 0.7028  data: 0.0122  max mem: 5187
Epoch: [7]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2442 (0.2848)  loss_classifier: 0.0335 (0.0403)  loss_box_reg: 0.0764 (0.0866)  loss_mask: 0.1283 (0.1386)  loss_objectness: 0.0018 (0.0031)  loss_rpn_box_reg: 0.0062 (0.0162)  time: 0.6923  data: 0.0101  max mem: 5187
Epoch: [7] Total time: 0:00:29 (0.7284 s / it)
Epoch: [8]  [ 0/40]  eta: 0:01:13  lr: 0.005000  loss: 0.1939 (0.1939)  loss_classifier: 0.0287 (0.0287)  loss_box_reg: 0.0565 (0.0565)  loss_mask: 0.1043 (0.1043)  loss_objectness: 0.0019 (0.0019)  loss_rpn_box_reg: 0.0025 (0.0025)  time: 1.8410  data: 1.0666  max mem: 5187
Epoch: [8]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2525 (0.2689)  loss_classifier: 0.0372 (0.0374)  loss_box_reg: 0.0747 (0.0792)  loss_mask: 0.1225 (0.1324)  loss_objectness: 0.0016 (0.0020)  loss_rpn_box_reg: 0.0074 (0.0180)  time: 0.7091  data: 0.0135  max mem: 5187
Epoch: [8]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2707 (0.2727)  loss_classifier: 0.0399 (0.0396)  loss_box_reg: 0.0821 (0.0825)  loss_mask: 0.1322 (0.1348)  loss_objectness: 0.0016 (0.0019)  loss_rpn_box_reg: 0.0061 (0.0138)  time: 0.6923  data: 0.0102  max mem: 5187
Epoch: [8] Total time: 0:00:29 (0.7303 s / it)
Epoch: [9]  [ 0/40]  eta: 0:01:12  lr: 0.005000  loss: 0.2287 (0.2287)  loss_classifier: 0.0303 (0.0303)  loss_box_reg: 0.0830 (0.0830)  loss_mask: 0.1129 (0.1129)  loss_objectness: 0.0006 (0.0006)  loss_rpn_box_reg: 0.0018 (0.0018)  time: 1.8099  data: 0.9906  max mem: 5187
Epoch: [9]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2514 (0.2740)  loss_classifier: 0.0354 (0.0360)  loss_box_reg: 0.0778 (0.0801)  loss_mask: 0.1258 (0.1267)  loss_objectness: 0.0011 (0.0018)  loss_rpn_box_reg: 0.0041 (0.0294)  time: 0.7077  data: 0.0122  max mem: 5187
Epoch: [9]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2717 (0.2729)  loss_classifier: 0.0343 (0.0366)  loss_box_reg: 0.0800 (0.0830)  loss_mask: 0.1285 (0.1323)  loss_objectness: 0.0012 (0.0016)  loss_rpn_box_reg: 0.0040 (0.0193)  time: 0.6943  data: 0.0104  max mem: 5187
Epoch: [9] Total time: 0:00:29 (0.7302 s / it)
Epoch: [10]  [ 0/40]  eta: 0:01:19  lr: 0.005000  loss: 0.2613 (0.2613)  loss_classifier: 0.0453 (0.0453)  loss_box_reg: 0.1038 (0.1038)  loss_mask: 0.1081 (0.1081)  loss_objectness: 0.0006 (0.0006)  loss_rpn_box_reg: 0.0035 (0.0035)  time: 1.9970  data: 1.2734  max mem: 5187
Epoch: [10]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2293 (0.2367)  loss_classifier: 0.0322 (0.0330)  loss_box_reg: 0.0634 (0.0697)  loss_mask: 0.1356 (0.1278)  loss_objectness: 0.0010 (0.0014)  loss_rpn_box_reg: 0.0040 (0.0050)  time: 0.6998  data: 0.0097  max mem: 5187
Epoch: [10]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2614 (0.2522)  loss_classifier: 0.0347 (0.0353)  loss_box_reg: 0.0748 (0.0752)  loss_mask: 0.1176 (0.1273)  loss_objectness: 0.0017 (0.0019)  loss_rpn_box_reg: 0.0070 (0.0124)  time: 0.6966  data: 0.0101  max mem: 5187
Epoch: [10] Total time: 0:00:29 (0.7323 s / it)
Epoch: [11]  [ 0/40]  eta: 0:01:24  lr: 0.000500  loss: 0.1697 (0.1697)  loss_classifier: 0.0359 (0.0359)  loss_box_reg: 0.0517 (0.0517)  loss_mask: 0.0787 (0.0787)  loss_objectness: 0.0018 (0.0018)  loss_rpn_box_reg: 0.0017 (0.0017)  time: 2.1095  data: 1.3535  max mem: 5187
Epoch: [11]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2105 (0.2454)  loss_classifier: 0.0323 (0.0345)  loss_box_reg: 0.0632 (0.0692)  loss_mask: 0.1108 (0.1198)  loss_objectness: 0.0010 (0.0013)  loss_rpn_box_reg: 0.0045 (0.0206)  time: 0.7076  data: 0.0092  max mem: 5187
Epoch: [11]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2302 (0.2438)  loss_classifier: 0.0310 (0.0353)  loss_box_reg: 0.0638 (0.0696)  loss_mask: 0.1188 (0.1223)  loss_objectness: 0.0009 (0.0022)  loss_rpn_box_reg: 0.0042 (0.0144)  time: 0.6948  data: 0.0102  max mem: 5187
Epoch: [11] Total time: 0:00:29 (0.7373 s / it)
Epoch: [12]  [ 0/40]  eta: 0:01:21  lr: 0.000500  loss: 0.2272 (0.2272)  loss_classifier: 0.0428 (0.0428)  loss_box_reg: 0.0654 (0.0654)  loss_mask: 0.1115 (0.1115)  loss_objectness: 0.0035 (0.0035)  loss_rpn_box_reg: 0.0041 (0.0041)  time: 2.0277  data: 1.2477  max mem: 5187
Epoch: [12]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2160 (0.2305)  loss_classifier: 0.0292 (0.0326)  loss_box_reg: 0.0524 (0.0655)  loss_mask: 0.1111 (0.1182)  loss_objectness: 0.0014 (0.0019)  loss_rpn_box_reg: 0.0052 (0.0125)  time: 0.7012  data: 0.0090  max mem: 5187
Epoch: [12]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2176 (0.2294)  loss_classifier: 0.0304 (0.0311)  loss_box_reg: 0.0541 (0.0634)  loss_mask: 0.1220 (0.1243)  loss_objectness: 0.0008 (0.0017)  loss_rpn_box_reg: 0.0032 (0.0090)  time: 0.6938  data: 0.0102  max mem: 5187
Epoch: [12] Total time: 0:00:29 (0.7321 s / it)
Epoch: [13]  [ 0/40]  eta: 0:01:10  lr: 0.000500  loss: 0.2678 (0.2678)  loss_classifier: 0.0322 (0.0322)  loss_box_reg: 0.0697 (0.0697)  loss_mask: 0.1598 (0.1598)  loss_objectness: 0.0005 (0.0005)  loss_rpn_box_reg: 0.0056 (0.0056)  time: 1.7562  data: 1.0143  max mem: 5187
Epoch: [13]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2001 (0.2239)  loss_classifier: 0.0298 (0.0306)  loss_box_reg: 0.0558 (0.0597)  loss_mask: 0.1128 (0.1226)  loss_objectness: 0.0011 (0.0015)  loss_rpn_box_reg: 0.0036 (0.0095)  time: 0.7080  data: 0.0115  max mem: 5187
Epoch: [13]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2180 (0.2270)  loss_classifier: 0.0286 (0.0310)  loss_box_reg: 0.0541 (0.0616)  loss_mask: 0.1215 (0.1244)  loss_objectness: 0.0008 (0.0017)  loss_rpn_box_reg: 0.0032 (0.0083)  time: 0.6971  data: 0.0105  max mem: 5187
Epoch: [13] Total time: 0:00:29 (0.7315 s / it)
Epoch: [14]  [ 0/40]  eta: 0:01:11  lr: 0.000500  loss: 0.2968 (0.2968)  loss_classifier: 0.0460 (0.0460)  loss_box_reg: 0.0902 (0.0902)  loss_mask: 0.1451 (0.1451)  loss_objectness: 0.0021 (0.0021)  loss_rpn_box_reg: 0.0135 (0.0135)  time: 1.7971  data: 1.0255  max mem: 5187
Epoch: [14]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2098 (0.2219)  loss_classifier: 0.0309 (0.0305)  loss_box_reg: 0.0587 (0.0610)  loss_mask: 0.1193 (0.1243)  loss_objectness: 0.0007 (0.0015)  loss_rpn_box_reg: 0.0031 (0.0046)  time: 0.7179  data: 0.0133  max mem: 5188
Epoch: [14]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2363 (0.2275)  loss_classifier: 0.0317 (0.0313)  loss_box_reg: 0.0625 (0.0626)  loss_mask: 0.1255 (0.1236)  loss_objectness: 0.0010 (0.0015)  loss_rpn_box_reg: 0.0041 (0.0084)  time: 0.6857  data: 0.0111  max mem: 5188
Epoch: [14] Total time: 0:00:29 (0.7303 s / it)
Epoch: [15]  [ 0/40]  eta: 0:01:07  lr: 0.000500  loss: 0.2398 (0.2398)  loss_classifier: 0.0247 (0.0247)  loss_box_reg: 0.0634 (0.0634)  loss_mask: 0.1471 (0.1471)  loss_objectness: 0.0025 (0.0025)  loss_rpn_box_reg: 0.0021 (0.0021)  time: 1.6940  data: 0.9901  max mem: 5188
Epoch: [15]  [20/40]  eta: 0:00:14  lr: 0.000500  loss: 0.2037 (0.2282)  loss_classifier: 0.0287 (0.0316)  loss_box_reg: 0.0585 (0.0610)  loss_mask: 0.1130 (0.1244)  loss_objectness: 0.0008 (0.0012)  loss_rpn_box_reg: 0.0029 (0.0100)  time: 0.6760  data: 0.0099  max mem: 5188
Epoch: [15]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2024 (0.2239)  loss_classifier: 0.0293 (0.0315)  loss_box_reg: 0.0499 (0.0602)  loss_mask: 0.1131 (0.1228)  loss_objectness: 0.0010 (0.0014)  loss_rpn_box_reg: 0.0053 (0.0080)  time: 0.6658  data: 0.0098  max mem: 5188
Epoch: [15] Total time: 0:00:27 (0.6981 s / it)
Epoch: [16]  [ 0/40]  eta: 0:01:16  lr: 0.000500  loss: 0.2110 (0.2110)  loss_classifier: 0.0317 (0.0317)  loss_box_reg: 0.0642 (0.0642)  loss_mask: 0.1093 (0.1093)  loss_objectness: 0.0004 (0.0004)  loss_rpn_box_reg: 0.0054 (0.0054)  time: 1.9114  data: 1.2382  max mem: 5188
Epoch: [16]  [20/40]  eta: 0:00:14  lr: 0.000500  loss: 0.2050 (0.2101)  loss_classifier: 0.0296 (0.0295)  loss_box_reg: 0.0495 (0.0552)  loss_mask: 0.1156 (0.1148)  loss_objectness: 0.0012 (0.0013)  loss_rpn_box_reg: 0.0024 (0.0093)  time: 0.6916  data: 0.0107  max mem: 5188
Epoch: [16]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2408 (0.2212)  loss_classifier: 0.0313 (0.0308)  loss_box_reg: 0.0650 (0.0604)  loss_mask: 0.1275 (0.1207)  loss_objectness: 0.0005 (0.0013)  loss_rpn_box_reg: 0.0042 (0.0079)  time: 0.6869  data: 0.0128  max mem: 5188
Epoch: [16] Total time: 0:00:28 (0.7213 s / it)
Epoch: [17]  [ 0/40]  eta: 0:01:17  lr: 0.000500  loss: 0.2686 (0.2686)  loss_classifier: 0.0413 (0.0413)  loss_box_reg: 0.0751 (0.0751)  loss_mask: 0.1479 (0.1479)  loss_objectness: 0.0002 (0.0002)  loss_rpn_box_reg: 0.0042 (0.0042)  time: 1.9448  data: 1.1663  max mem: 5188
Epoch: [17]  [20/40]  eta: 0:00:14  lr: 0.000500  loss: 0.1951 (0.2127)  loss_classifier: 0.0270 (0.0282)  loss_box_reg: 0.0537 (0.0569)  loss_mask: 0.1166 (0.1219)  loss_objectness: 0.0007 (0.0009)  loss_rpn_box_reg: 0.0029 (0.0048)  time: 0.6736  data: 0.0117  max mem: 5188
Epoch: [17]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.1986 (0.2179)  loss_classifier: 0.0245 (0.0285)  loss_box_reg: 0.0522 (0.0586)  loss_mask: 0.1106 (0.1218)  loss_objectness: 0.0008 (0.0011)  loss_rpn_box_reg: 0.0058 (0.0079)  time: 0.6933  data: 0.0126  max mem: 5188
Epoch: [17] Total time: 0:00:28 (0.7174 s / it)
Epoch: [18]  [ 0/40]  eta: 0:01:18  lr: 0.000500  loss: 0.1237 (0.1237)  loss_classifier: 0.0171 (0.0171)  loss_box_reg: 0.0251 (0.0251)  loss_mask: 0.0800 (0.0800)  loss_objectness: 0.0004 (0.0004)  loss_rpn_box_reg: 0.0011 (0.0011)  time: 1.9625  data: 1.2571  max mem: 5188
Epoch: [18]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2090 (0.2138)  loss_classifier: 0.0277 (0.0288)  loss_box_reg: 0.0619 (0.0566)  loss_mask: 0.1152 (0.1205)  loss_objectness: 0.0008 (0.0015)  loss_rpn_box_reg: 0.0043 (0.0064)  time: 0.6991  data: 0.0117  max mem: 5188
Epoch: [18]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2212 (0.2143)  loss_classifier: 0.0274 (0.0290)  loss_box_reg: 0.0540 (0.0553)  loss_mask: 0.1178 (0.1206)  loss_objectness: 0.0009 (0.0015)  loss_rpn_box_reg: 0.0038 (0.0079)  time: 0.6903  data: 0.0128  max mem: 5188
Epoch: [18] Total time: 0:00:29 (0.7276 s / it)
Epoch: [19]  [ 0/40]  eta: 0:01:10  lr: 0.000500  loss: 0.2665 (0.2665)  loss_classifier: 0.0379 (0.0379)  loss_box_reg: 0.0808 (0.0808)  loss_mask: 0.1417 (0.1417)  loss_objectness: 0.0013 (0.0013)  loss_rpn_box_reg: 0.0048 (0.0048)  time: 1.7637  data: 1.0465  max mem: 5188
Epoch: [19]  [20/40]  eta: 0:00:14  lr: 0.000500  loss: 0.1901 (0.2097)  loss_classifier: 0.0256 (0.0292)  loss_box_reg: 0.0479 (0.0558)  loss_mask: 0.1156 (0.1198)  loss_objectness: 0.0006 (0.0012)  loss_rpn_box_reg: 0.0026 (0.0037)  time: 0.6958  data: 0.0129  max mem: 5188
Epoch: [19]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2163 (0.2146)  loss_classifier: 0.0274 (0.0300)  loss_box_reg: 0.0538 (0.0563)  loss_mask: 0.1203 (0.1197)  loss_objectness: 0.0010 (0.0015)  loss_rpn_box_reg: 0.0053 (0.0071)  time: 0.6859  data: 0.0117  max mem: 5188
Epoch: [19] Total time: 0:00:28 (0.7189 s / it)
{% endraw %}

Calculate metrics

Visualise a few samples and print the IoU metric for each of them:

{% raw %}
from dolphins_recognition_challenge.instance_segmentation.model import show_prediction, iou_metric_example

for i in range(4):
    iou_test_image = iou_metric_example(model, data_loader_test.dataset[i], 0.5)
    img, _ = data_loader_test.dataset[i]
    print(f"IOU metric for the input image is: {iou_test_image}")
    show_prediction(model, img, width=820)
IOU metric for the input image is: 0.611417860986359
IOU metric for the input image is: 0.6266646179274642
IOU metric for the input image is: 0.49453726340315735
IOU metric for the input image is: 0.5125102391690628
{% endraw %}
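The library helpers compute the metric for us, but the underlying idea is plain pixel-wise intersection over union between the predicted and ground-truth binary masks. A minimal sketch of that computation (the function name `mask_iou` is our own; the exact aggregation used by `iou_metric_example` is an assumption):

{% raw %}
```python
import torch


def mask_iou(pred_mask, true_mask):
    """Pixel-wise IoU between two binary masks of the same shape (sketch)."""
    pred = pred_mask.bool()
    true = true_mask.bool()
    intersection = (pred & true).sum().item()
    union = (pred | true).sum().item()
    # empty union means neither mask contains any pixels
    return intersection / union if union > 0 else 0.0
```
{% endraw %}

A perfect prediction gives 1.0, no overlap gives 0.0; the values printed above fall in between.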

Calculate the mean IoU metric for the entire test set:

{% raw %}
%%time

from dolphins_recognition_challenge.instance_segmentation.model import iou_metric, show_predictions_sorted_by_iou

mean_iou_testset, _ = iou_metric(model, data_loader_test.dataset)

print(f"Mean IOU metric for the test set is: {mean_iou_testset}")
Mean IOU metric for the test set is: 0.4572200769277805
CPU times: user 11.1 s, sys: 21.2 ms, total: 11.1 s
Wall time: 7.1 s
{% endraw %}

...

{% raw %}
show_predictions_sorted_by_iou(model, data_loader_test.dataset)
IOU metric: 0.21098597787540233
IOU metric: 0.2232096224908163
IOU metric: 0.2512907010096409
IOU metric: 0.2766092920159012
IOU metric: 0.31027277114326873
IOU metric: 0.3237450562303182
IOU metric: 0.33076757255564115
IOU metric: 0.33328621638503103
IOU metric: 0.36783285284834627
IOU metric: 0.3693530331366992
IOU metric: 0.3765179274671962
IOU metric: 0.37857096487455416
IOU metric: 0.391026799403781
IOU metric: 0.39555250206320397
IOU metric: 0.3962583467560526
IOU metric: 0.40203977087818266
IOU metric: 0.4181012870137721
IOU metric: 0.4206897893475901
IOU metric: 0.43016028679887897
IOU metric: 0.4385907676718781
IOU metric: 0.45471250788992856
IOU metric: 0.46188859946093164
IOU metric: 0.4756197941968153
IOU metric: 0.49453726340315735
IOU metric: 0.4960627464597811
IOU metric: 0.5122488726309984
IOU metric: 0.5125102391690628
IOU metric: 0.5155329752183607
IOU metric: 0.5251719540545396
IOU metric: 0.5573696303945055
IOU metric: 0.5738050562126649
IOU metric: 0.5869603948990534
IOU metric: 0.611417860986359
IOU metric: 0.6266646179274642
IOU metric: 0.637678156040616
IOU metric: 0.6507437603698991
IOU metric: 0.6708286594422596
IOU metric: 0.6766911894228075
IOU metric: 0.7462771840380762
{% endraw %}

Submit solution

Here we can see how to use the submit_model function. We must pass the trained model, an alias that will be displayed on the leaderboard, a name, and an email. The function returns the path to the zipped submission file.

{% raw %}
from dolphins_recognition_challenge.submissions import submit_model

zip_fname = submit_model(model, alias="dolphin123", name="Name Surname", email="name.surname@gmail.com")
{% endraw %}

Here we can check what is in the zip file. It contains the model and two CSV files: the first contains the IoU metric for each image from the validation set, and the second contains information about the competitor.

{% raw %}
!unzip -vl "{zip_fname}"
Archive:  submission-iou=0.45722-dolphin123-name.surname@gmail.com-2021-01-04T09:17:55.902567.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    3356  Stored     3356   0% 2021-01-04 09:17 9cb3fed7  submission-iou=0.45722-dolphin123-name.surname@gmail.com-2021-01-04T09:17:55.902567/metrics.csv
176247136  Stored 176247136   0% 2021-01-04 09:17 136b4c88  submission-iou=0.45722-dolphin123-name.surname@gmail.com-2021-01-04T09:17:55.902567/model.pt
      93  Stored       93   0% 2021-01-04 09:17 30e1f561  submission-iou=0.45722-dolphin123-name.surname@gmail.com-2021-01-04T09:17:55.902567/info.csv
--------          -------  ---                            -------
176250585         176250585   0%                            3 files
{% endraw %}